DOP: Deep Optimistic Planning with Approximate Value Function Evaluation
Authors
Abstract
Research on reinforcement learning has demonstrated promising results in manifold applications and domains. Still, efficiently learning effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainties, and large state dimensionality (e.g. multi-agent systems or hyper-redundant robots). To alleviate this problem, we present DOP, a deep model-based reinforcement learning algorithm, which exploits action values to both (1) guide the exploration of the state space and (2) plan effective policies. Specifically, we exploit deep neural networks to learn Q-functions that are used to attack the curse of dimensionality during a Monte-Carlo tree search. Our algorithm, in fact, constructs upper confidence bounds on the learned value function to select actions optimistically. We implement and evaluate DOP on different scenarios: (1) a cooperative navigation problem, (2) a fetching task for a 7-DOF KUKA robot, and (3) a human-robot handover with a humanoid robot (both in simulation and on the real robot). The obtained results show the effectiveness of DOP in the chosen applications, where action values drive the exploration and reduce the computational demand of the planning process while achieving good performance.
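As an illustration of the optimistic action-selection step described in the abstract, the following Python sketch combines a learned Q-function with an upper-confidence-bound exploration bonus. It is a minimal sketch, not the authors' implementation: the names QNetwork, ucb_score, and select_action, and the linear stand-in for the deep network, are illustrative assumptions.

```python
# Minimal sketch (not the DOP implementation) of optimistic action selection
# guided by a learned Q-function: pick the action maximising Q(s, a) plus an
# upper-confidence exploration bonus. All names are illustrative assumptions.
import math

import numpy as np


class QNetwork:
    """Stand-in for a deep Q-function; here a random linear model."""

    def __init__(self, state_dim, num_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=(num_actions, state_dim))

    def q_values(self, state):
        return self.weights @ state


def ucb_score(q_value, visits_parent, visits_action, c=1.0):
    """Upper confidence bound on the learned value; optimism drives exploration."""
    bonus = c * math.sqrt(math.log(visits_parent + 1) / (visits_action + 1))
    return q_value + bonus


def select_action(qnet, state, visit_counts, c=1.0):
    """Select the action with the highest optimistic score."""
    q = qnet.q_values(state)
    total = sum(visit_counts) + 1
    scores = [ucb_score(q[a], total, visit_counts[a], c) for a in range(len(q))]
    return int(np.argmax(scores))


if __name__ == "__main__":
    state_dim, num_actions = 4, 3
    qnet = QNetwork(state_dim, num_actions)
    state = np.ones(state_dim)
    visits = [0] * num_actions
    for _ in range(20):                      # a few optimistic selections
        a = select_action(qnet, state, visits)
        visits[a] += 1
    print("visit counts after 20 selections:", visits)
```

In a full tree search, this selection rule would be applied at every node of the search tree, so less-visited actions with promising learned values are expanded first.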
Similar references
Map-Based Strategies for Robot Navigation in Unknown Environments
Robot path planning algorithms for finding a goal in an unknown environment focus on completeness rather than optimality. In this paper, we investigate several strategies for using map information, however incomplete or approximate, to reduce the cost of the robot’s traverse. The strategies are based on optimistic, pessimistic, and average value assumptions about the unknown portions of the robo...
Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
Approximate dynamic programming approaches to the reinforcement learning problem are often categorized into greedy value function methods and value-based policy gradient methods. As our first main result, we show that an important subset of the latter methodology is, in fact, a limiting special case of a general formulation of the former methodology; optimistic policy iteration encompasses not ...
Lambda-Policy Iteration: A Review and a New Implementation
In this paper we discuss λ-policy iteration, a method for exact and approximate dynamic programming. It is intermediate between the classical value iteration (VI) and policy iteration (PI) methods, and it is closely related to optimistic (also known as modified) PI, whereby each policy evaluation is done approximately, using a finite number of VI steps. We review the theory of the method and associat...
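The following is a minimal sketch of the optimistic (modified) policy iteration idea mentioned in this abstract, where each policy evaluation is replaced by a finite number of value-iteration sweeps. The two-state MDP, its transition probabilities, and all variable names are made-up assumptions for illustration only.

```python
# Illustrative sketch of optimistic (modified) policy iteration on a tiny
# tabular MDP: each policy evaluation uses only m backup sweeps instead of
# being solved exactly. The 2-state, 2-action MDP below is a made-up example.
import numpy as np

# Transition probabilities P[a, s, s'] and rewards R[a, s].
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9


def optimistic_policy_iteration(m=5, iterations=50):
    n_actions, n_states = R.shape
    V = np.zeros(n_states)
    for _ in range(iterations):
        # Greedy policy improvement with respect to the current value estimate.
        Q = R + gamma * (P @ V)              # shape (n_actions, n_states)
        policy = Q.argmax(axis=0)
        # Approximate policy evaluation: only m value-iteration-style sweeps.
        for _ in range(m):
            V = np.array([R[policy[s], s] + gamma * P[policy[s], s] @ V
                          for s in range(n_states)])
    return policy, V


if __name__ == "__main__":
    policy, V = optimistic_policy_iteration()
    print("greedy policy:", policy, "value estimate:", V)
```

Setting m = 1 recovers value iteration and letting m grow large approaches exact policy iteration, which is the sense in which the method is "intermediate" between the two.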
Bounded Approximations for Linear Multi-Objective Planning Under Uncertainty
Planning under uncertainty poses a complex problem in which multiple objectives often need to be balanced. When dealing with multiple objectives, it is often assumed that the relative importance of the objectives is known a priori. However, in practice human decision makers often find it hard to specify such preferences, and would prefer a decision support system that presents a range of possib...
Approximate Solution of the Second Order Initial Value Problem by Using Epsilon Modified Block-Pulse Function
The present work approaches the problem of obtaining the approximate solution of second order initial value problems (IVPs) via their conversion into a Volterra integral equation of the second kind (VIE2). Therefore, we initially solve the IVPs using the fourth-order Runge–Kutta method (RK), then convert them into VIE2, and apply the ε-modified block-pulse functions (εMBPFs) and their oper...
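A brief sketch of the first step mentioned in this abstract, solving a second-order IVP with the fourth-order Runge–Kutta method after reducing it to a first-order system. The test equation, step size, and function names are assumptions; the Volterra conversion and ε-modified block-pulse stage are not shown.

```python
# Hedged sketch: fourth-order Runge-Kutta for a second-order IVP, reduced to a
# first-order system y' = v, v' = f(t, y, v). The test problem y'' = -y,
# y(0) = 0, y'(0) = 1 (exact solution y = sin t) is an illustrative assumption.
import math

import numpy as np


def f(t, state):
    y, v = state
    return np.array([v, -y])              # y' = v, v' = -y


def rk4_step(t, state, h):
    """One classical RK4 step for the system state' = f(t, state)."""
    k1 = f(t, state)
    k2 = f(t + h / 2, state + h / 2 * k1)
    k3 = f(t + h / 2, state + h / 2 * k2)
    k4 = f(t + h, state + h * k3)
    return state + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


if __name__ == "__main__":
    t, h = 0.0, 0.01
    state = np.array([0.0, 1.0])          # y(0) = 0, y'(0) = 1
    while t < 1.0:
        state = rk4_step(t, state, h)
        t += h
    print(f"y(1) ~ {state[0]:.6f}, exact sin(1) = {math.sin(1.0):.6f}")
```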
Publication date: 2018